Abstract

This paper considers the model of an arbitrarily distributed signal x observed 
through an added independent white Gaussian noise w, y = x + w. New relations 
between the minimal mean square error of the non-causal estimator and the likelihood 
ratio between y and w are derived. This is followed by an extended version of a 
recently derived relation between the mutual information $I(x;y)$ and the minimal
mean square error. These results are applied to derive infinite dimensional versions 
of the Fisher information and the de Bruijn identity. A comparison between the causal
and non-causal estimation errors yields a restricted form of the logarithmic Sobolev
inequality. The derivation of the results is based on the Malliavin calculus.

1 Introduction



Let $w_t$, $0 \le t \le T$ denote the standard $d$-dimensional Wiener process and $w'_t$
the related white noise. The white noise channel is, roughly speaking, defined by
$y'(t) = x'(t) + w'_t$, $0 \le t \le T$, where $x'(t)$ is a signal process independent of the
white noise process $w'_t$. In the context of detection theory, the key entity is $\ell(y)$,
the likelihood ratio, i.e. the Radon-Nikodym derivative of the measure induced by
the $\{y'(t), t \in [0,T]\}$ process with respect to the measure induced by the white
noise $\{w'_t, t \in [0,T]\}$. In the context of filtering theory the key entities are the
causal and the non causal estimates, i.e. the conditional means
$E(x'_t \mid y'_\eta, 0 \le \eta \le t)$ or $E(x'_t \mid y'_\eta, 0 \le \eta \le T)$ respectively.
In addition to this pair of random entities, there are also averaged entities such as the
averaged minimal errors which amount to
$\int_0^T E\big(x'_t - E(x'_t \mid y'_\eta, \eta \in [0,t])\big)^2 dt$ or
$\int_0^T E\big(x'_t - E(x'_t \mid y'_\eta, \eta \in [0,T])\big)^2 dt$; and on
the other hand, the mutual information between the paths $y = (y_\eta, \eta \in [0,T])$ and
$x = (x_\eta, \eta \in [0,T])$, i.e.

$$E \log \frac{dP(x,y)}{d(P(x)P(y))}$$

where the expectation is w.r. to the $P(x,y)$ measure, and also the relative entropy
$E \log \ell(y)$. Relations between the likelihood ratio $\ell(y)$ and the causal conditional
expectation were discovered in the late 60's and this was soon followed by a relation
between the mutual information and the causal mean square error ^U] , [SI , [HI. These
relations, which involved causal mean square errors, were based on the Ito calculus.
Similar problems for the non causal estimator were also considered 0; [El. The
formulation and results in the non causal case were restricted to the finite dimensional
time discrete model of the Gaussian channel. Recently, however, Guo, Shamai and Verdú
(GSV) [7] applied information theoretic arguments to derive new interesting results
relating the mutual information with non causal estimation in Gaussian channels.

The Ito calculus, which has proved to be a powerful tool for the relations associated
with causal estimation, could not be applied to non causal problems, which explains
the slow progress in the direction of relations for non causal estimates. However, the
development of the Malliavin calculus, namely the stochastic calculus of variations,
which was introduced in the mid 70's, led in the early 80's to results which prove to
be a very useful tool for the non causal type of problems.

The purpose of this paper is to apply the Malliavin calculus in order to derive
the extension of the finite dimensional discrete time relations between non causal
estimation and likelihood ratios to continuous time (section 4), and to prove an extended
version of the results of [7] relating the mutual information with the non causal
estimation error (section 5). The modelling of the additive Gaussian channel on the
abstract Wiener space (in contrast to the $d$-dimensional Wiener process on the time
interval $[0,T]$) yields in sections 4 and 5 results of wide applicability, e.g. for the
filtering and transmission of images and random fields. The relation of these results to
the de Bruijn identity, causal filtering and the logarithmic Sobolev inequality is
discussed in sections 6 and 7.

In the next section we define the Abstract Wiener Space, which generalizes the
classical $d$-dimensional Wiener process, and formulate the additive Gaussian channel
which will be considered in the paper. Also, the problems considered in sections 4 and
5 are outlined in this section. Section 3 is a very short introduction to the Malliavin
calculus. Section 4 presents the results relating likelihood ratios (R-N derivatives)
with non-causal least square estimates (cf. remark 2 in section 4 for possible
applications of these results to nonlinear filtering). In section 5 we derive an extended
version of the GSV results. These results are applied in section 6 to consider the notions
of Fisher information and the de Bruijn identity in an infinite dimensional setup.
Section 7 deals with abstract Wiener spaces endowed with a time parameter. This
enables the comparison of results for causal estimation with corresponding results for
non-causal estimation. It is shown that a restricted form of the logarithmic Sobolev
inequality follows directly from the results derived in this paper.

Acknowledgement: We wish to express our thanks to Shlomo Shamai for calling
our attention to the problems considered in this paper and providing us with
a preliminary version of [7], and to Suleyman Üstünel and Ofer Zeitouni for useful
comments.

2 The underlying Wiener space and the additive channel model



A. Consider, first, a standard one-dimensional Wiener process on $[0,1]$, say
$w(t), t \in [0,1]$. Let $\{\eta_i(t), i = 1,2,\ldots, t \in [0,1]\}$ be a complete
orthonormal base in $L^2[0,1]$. Set $e_i(t) = \int_0^t \eta_i(s)\,ds$, then
$\sum_{i=1}^{n} \left(\int_0^1 \eta_i(s)\,dw(s)\right) e_i(t)$ converges to $w(t)$ in quadratic
mean as $n \to \infty$. We will denote the sequence of independent, identically distributed
(i.i.d.) Gaussian random variables
$\left\{\int_0^1 \eta_i(s)\,dw(s) = \int_0^1 \frac{de_i(s)}{ds}\,dw(s),\ i = 1,2,\ldots\right\}$
by $\delta e_i$, $i = 1,2,\ldots$. Then

$$w(t) = \sum_{i=1}^{\infty} (\delta e_i)\, e_i(t). \tag{2.1}$$

Now $\{e_i(t), t \in [0,1], i = 1,2,\ldots\}$ can be considered as a C.O.N. base of a Hilbert
space $H$ of functions $h(t), t \in [0,1]$ with scalar product

$$(h_1,h_2)_H = \int_0^1 \frac{dh_1(s)}{ds}\,\frac{dh_2(s)}{ds}\,ds. \tag{2.2}$$

This space $H$ is known as the Cameron-Martin space. Note that the Wiener process,
which is continuous but not differentiable, is not an element of $H$. The same notation
goes over to the case of the $d$-dimensional Wiener process with $w(t), \eta(t), h(t), e(t)$
taking values in $\mathbb{R}^d$ and

$$E\,\delta e = 0, \qquad E(\delta e)^2 = |e|_H^2. \tag{2.3}$$

In this model we will consider $y(t) = x(t) + w(t)$ where $(x(t), t \in [0,1])$ takes values in
the Cameron-Martin space $H$ and $w(t)$, the Wiener process, takes values in the space
of $\mathbb{R}^d$ valued continuous functions considered as a Banach space $W$ under the norm

$$|w|_W = \sup_{t \in [0,1]} |w(t)|_{\mathbb{R}^d}.$$

In addition to the Banach space $W$ we have to consider the space $W^*$ of all
continuous linear functionals on $W$, and it can be shown that $W^*$ is a dense subspace
of $H$ (cf. e.g. [15]). Hence for $e \in W^*$, it also holds that $e \in H$ and "$e \in W^*$
operating on $w \in W$" is the stochastic integral

$${}_{W}(w,e)_{W^*} = \delta e. \tag{2.4}$$

An abstract model for the Wiener process in terms of the spaces $W$, $W^*$, $H$ and the
Wiener measure $\mu_W$ is considered in the next subsection. The reader can skip this
step by interpreting the triplet $(W, H, \mu_W)$ as the $d$-dimensional Wiener process
described by the equations of this subsection.
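As a numerical illustration of the expansion (2.1), the following minimal sketch assumes a specific orthonormal base that the paper does not prescribe: the cosine base $\eta_0(t) = 1$, $\eta_i(t) = \sqrt{2}\cos(i\pi t)$, for which $e_i(t) = \int_0^t \eta_i(s)\,ds$ is explicit. The function names are hypothetical. By Parseval's identity $\sum_i e_i(t)^2 = t = E\,w(t)^2$, which the truncated sum approaches.

```python
import math
import random

def e(i, t):
    """e_i(t) = integral_0^t eta_i(s) ds for the assumed cosine base."""
    if i == 0:
        return t                      # eta_0(s) = 1
    return math.sqrt(2.0) * math.sin(i * math.pi * t) / (i * math.pi)

def partial_variance(t, n):
    """Variance of the truncated sum  sum_{i<n} (delta e_i) e_i(t)."""
    return sum(e(i, t) ** 2 for i in range(n))

def sample_path(times, n, rng):
    """One sample of the truncated expansion, evaluated at the given times."""
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # i.i.d. N(0,1) delta e_i
    return [sum(c * e(i, t) for i, c in enumerate(coeffs)) for t in times]

if __name__ == "__main__":
    print(partial_variance(0.3, 2000))                 # close to t = 0.3
    print(sample_path([0.0, 0.5, 1.0], 50, random.Random(0)))
```

The truncation level `n` controls the quadratic-mean error; with this base the variance defect at time $t$ is of order $1/n$.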

B. The Abstract Wiener Space (AWS) is an abstraction of this model where $W$ is a
separable Banach space and $H$, the Cameron-Martin space, is a Hilbert space densely
and continuously embedded in $W$. The dual space to $W$ (the space of continuous
linear functionals on $W$) is denoted $W^*$ and assumed to be continuously and densely
embedded in $H$. The Abstract Wiener Space $(W, H, \mu_W)$ supports a $W$-valued random
variable $w$ such that for every $e \in W^*$, $\delta e := {}_{W}(w,e)_{W^*}$ is a $N(0, |e|_H^2)$ random

variable. Cf. e.g. ^7], or appendix B of ^W. and the references therein, for further 
information on the AWS. Note that, unlike the classical case, the Abstract Wiener 
Space does not have any time-like parameter (this however can be added cf. section 7). 

C. In order to introduce the general setup of the additive Gaussian channel, let
$(W, H, \mu_W)$ be an abstract Wiener space and let $(H, \sigma(H), \mu_X)$ be a probability
space on the Cameron-Martin Hilbert space $H$ which is induced by an $H$-valued r.v. $x$.
Let $\theta = (x,w)$, $x \in H$ and $w \in W$, set $\Theta = \{\theta\}$ and consider the combined
probability space

$$(\Theta, \mathcal{F}, P) = \big(H \times W,\ \sigma(H) \otimes \sigma(W),\ \mu_X \times \mu_W\big) \tag{2.5}$$

which is the space of the mutually independent 'signals' $x$ and 'noise' $w$. Now, since
$H$ is continuously embedded in $W$ we can identify $x$ with its image in $W$ and define
the additive Gaussian channel as

$$y(\theta) = \rho x + w, \tag{2.6}$$

where $\rho$ is a free scalar 'signal to noise' parameter which will become relevant in
Section 5. We will denote by $X$ and $Y$ the sigma fields induced on $W$ by the r.v.'s $x$
and $y$ respectively. Note that $y$ and $w$ are $W$ valued, $x$ is $H$ valued and we identify
$x$ with its image in $W$. In fact we will make throughout this paper, just for reasons
of simplicity, the additional assumption that $x$ is $W^*$ valued. As mentioned earlier,
since $W^* \subset H \subset W$ we can also consider $x$ to be $H$ or $W$ valued.

In section 4 we will be interested in the relation between two types of objects.
The first class of objects is

$$E\big((x,e)_H \mid Y\big) \quad\text{and}\quad E\big((x,e_1)_H \cdot (x,e_2)_H \mid Y\big)$$

for $e, e_1, e_2 \in W^*$ or globally

$$\hat{x} = E(x \mid Y) \quad\text{and}\quad \widehat{(x,x)_H} = E\big((x,x)_H \mid Y\big). \tag{2.7}$$

The second class of objects is the likelihood ratio (the R-N derivative) between the
measure induced by $y$ and the one induced by $w$ on $W$. This likelihood ratio will
be denoted $\ell(w)$, $w \in W$. Note that if $W$ is infinite dimensional then the measure
induced by $x$ is singular with respect to the measure induced by $w$ (since $x \in H$
while $w \notin H$).

In section 5 we will consider the relation between $I(X;Y)$, or rather $dI(X;Y)/d\rho$,
and the non-causal filtering error

$$E|x - \hat{x}|_H^2.$$

A related result for $d(E \log \ell(w))/d\rho$ is considered in section 6 and shown to be an
extended version of the De Bruijn identity.



3 A short introduction to the Malliavin calculus



For further information cf. e.g. jTTj, JH], or appendix B of 

(a) The gradient 

Let $(W,H,\mu)$ be an AWS and let $e_i$, $i = 1,2,\ldots$ be a sequence of elements in
$W^*$. Assume that the images of the $e_i$ in $H$ form a complete orthonormal base in $H$. Let
$f(x_1, \ldots, x_n)$ be a smooth function on $\mathbb{R}^n$ and denote by $f'_i$ the partial
derivative of $f$ with respect to the $i$-th coordinate and let $\delta e$ be as discussed in
the previous section.

For cylindrical smooth random variables $F(w) = f(\delta e_1, \ldots, \delta e_n)$, define

$$\nabla_h F = \left.\frac{dF(w + \varepsilon h)}{d\varepsilon}\right|_{\varepsilon = 0}.$$

Therefore we set the following: $\nabla_h F = (\nabla F, h)_H$ where $\nabla F$, the
gradient, is $H$-valued. For $F(w) = \delta e$, $\nabla F = e$, and

$$\nabla F = \sum_{i=1}^{n} f'_i(\delta e_1, \ldots, \delta e_n)\, e_i. \tag{3.1}$$

It can be shown that this definition is closable in $L^p(\mu)$ for any $p > 1$, which means
that it can be extended to a wider class of functionals as we will see below. We
will restrict ourselves to $p = 2$; consequently the domain of the $\nabla$ operation can be
extended to all functions $F(w)$ for which there exists a sequence of smooth cylindrical
functions $F_m$ such that $F_m \to F$ in $L_2$ and $\nabla F_m$ is Cauchy in $L_2(\mu, H)$. In this case
set $\nabla F$ to be the $L_2(\mu,H)$ limit of $\nabla F_m$. This class of r.v. will be denoted $D_{2,1}$. It
is a closed linear space under the norm

$$\|F\|_{2,1} = E^{1/2}|F|^2 + E^{1/2}|\nabla F|_H^2. \tag{3.2}$$

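Similarly let $K$ be a Hilbert space and $k_1, k_2, \ldots$ a complete orthonormal base in $K$.
Let $\varphi$ be the smooth $K$-valued function
$\varphi = \sum_{j=1}^{m} f_j(\delta e_1, \ldots, \delta e_n)\, k_j$, define

$$\nabla\varphi = \sum_{j=1}^{m}\sum_{i=1}^{n} (f_j)'_i(\delta e_1, \ldots, \delta e_n)\, e_i \otimes k_j \tag{3.3}$$

and denote by $D_{2,1}(K)$ the completion of the class of such $\varphi$ under the norm

$$\|\varphi\|_{2,1} = E^{1/2}|\varphi|_K^2 + E^{1/2}|\nabla\varphi|_{H \otimes K}^2. \tag{3.4}$$

Note that this enables us to define recursively $\nabla^n F(w)$ for $n > 1$.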

(b) The divergence (the Skorohod integral) 

A few introductory remarks. Let $v(x)$, $x \in \mathbb{R}^n$ take values in $\mathbb{R}^n$,
$v(x) = \sum_{i=1}^{n} v_i(x)\, p_i$, where the $p_i$ are orthonormal vectors in $\mathbb{R}^n$. Assume
that the $v_i$ and $F(x)$ are smooth and converge "quickly enough" to zero as $|x| \to \infty$.
Then the following "integration by parts formula" holds

$$\int_{\mathbb{R}^n} (v(x), \nabla F(x))\,dx = -\int_{\mathbb{R}^n} F(x)\,\mathrm{div}\,v\,dx, \tag{3.5}$$

where div is the divergence:

$$\mathrm{div}\,v = \sum_{i=1}^{n} \frac{\partial v_i}{\partial x_i}.$$

Note that the gradient and divergence are differential operations, and equation (3.5)
deals with integration with respect to the Lebesgue measure on $\mathbb{R}^n$. In this subsection
we are looking for an analog of the divergence operation on $\mathbb{R}^n$ which will yield an
integration by parts formula with respect to the Wiener measure.

Let $u(w)$ be an $H$-valued r.v. in $(W,H,\mu)$; $u$ will be said to be in $\mathrm{dom}_2\,\delta$ if
$E|u(w)|_H^2 < \infty$ and there exists a r.v., say $\delta u$, such that for all smooth functionals
$f(\delta e_1, \ldots, \delta e_n)$ and all $n$ the "integration by parts" relation

$$E(\nabla f, u(w))_H = E(f\,\delta u) \tag{3.6}$$

is satisfied. $\delta u$ is called the divergence or Skorohod integral. A necessary and sufficient
condition for a square integrable $u(w)$ to be in $\mathrm{dom}_2\,\delta$ is that for some $\gamma = \gamma(u)$,

$$\big|E(u(w), \nabla f)_H\big| \le \gamma\, E^{1/2} f^2(w)$$

for all smooth $f$. Note that while the definition of $\nabla f$ (at least for smooth functionals)
is invariant under an absolutely continuous change of measure, this is not the case for
the divergence, which involves expectation in the definition. For non-random $h \in W^*$,
$\delta h = (h,w)$, and setting $f = 1$ in (3.6) yields that $E\,\delta h = 0$. It can be shown that if
$u \in D_{2,1}(H)$ then $u \in \mathrm{dom}_2\,\delta$. Also, for smooth $f(w)$ it can be verified directly that

$$\delta(f(w)h) = f(w)\,\delta h - (\nabla f, h)_H$$

and more generally under proper restrictions

$$\delta(f(w)u(w)) = f(w)\,\delta u - (\nabla f, u)_H. \tag{3.7}$$

Consequently, if $E|u|_H^2 < \infty$ and $\nabla u$ is of trace class then

$$\delta u = \sum_{i=1}^{\infty} (u, e_i)_H\,\delta e_i - \mathrm{trace}\,\nabla u,$$

where for an operator $A$ on $H$ and $e_i$, $i = 1, 2, \ldots$ a CONB on $H$, define

$$\mathrm{trace}\,A = \sum_{i=1}^{\infty} (e_i, A e_i)_H$$

provided the series converges absolutely, and in this case $A$ is said to be of trace class.
Among the interesting facts about the divergence operator, let us also note that for
the classical Brownian motion, if

$$u(w) = \int_0^{\cdot} u'_s(w)\,ds$$

and $u'_s(w)$ is adapted and square integrable, then $\delta u$ coincides with the Ito integral,
i.e. $\delta u = \int_0^1 u'_s(w)\,dw_s$.
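The integration by parts relation that defines the divergence can be checked numerically in the simplest setting. The sketch below assumes $W = H = \mathbb{R}$ with the standard Gaussian measure, where the divergence of a smooth $u$ is $\delta u = u(w)\,w - u'(w)$ (a standard one-dimensional fact, taken here as an assumption of the sketch). The test functions `f` and `u` and the helper `gauss_expect` are hypothetical choices for illustration.

```python
import math

def gauss_expect(g, lo=-12.0, hi=12.0, steps=24000):
    """E g(w) for w ~ N(0,1), by the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        w = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * g(w) * math.exp(-0.5 * w * w)
    return total * h / math.sqrt(2.0 * math.pi)

f, df = math.sin, math.cos                 # smooth functional and its gradient
u = lambda w: 1.0 + w * w                  # hypothetical integrand u(w)
du = lambda w: 2.0 * w                     # its gradient (trace term in 1-D)
delta_u = lambda w: u(w) * w - du(w)       # divergence of u in one dimension

lhs = gauss_expect(lambda w: df(w) * u(w))        # E (grad f, u)
rhs = gauss_expect(lambda w: f(w) * delta_u(w))   # E f * delta u

if __name__ == "__main__":
    print(lhs, rhs)                        # the two sides agree
```

For this choice both sides equal $e^{-1/2}$, which follows from the characteristic function of the standard Gaussian.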

(c) Let $(W,H,\mu)$ be an abstract Wiener space and let $\mu_1$ be another probability
measure on the same space $(W, \sigma\{W\})$. Assume that $\mu_1$ is absolutely continuous
with respect to $\mu$. Set

$$\ell(w) = \frac{d\mu_1}{d\mu}(w) \quad\text{and}\quad Q = \{w : \ell(w) > 0\}.$$

$E_1$ and $E_0$ will be used to denote the expectation with respect to the measures $\mu_1$
and $\mu$ respectively. We will use the convention $0 \log 0 = 0$ throughout the paper.

Following the definition in 3(b), we define the divergence with respect to $\mu_1$, denoted
$\delta_1$, as follows. The $H$-valued random variable $u(w)$ will be said to be in
$\mathrm{dom}_2\,\delta_1$ if there exists a r.v., say $\delta_1 u$, which is in $L_2$ under $\mu_1$ and
such that for all smooth r.v.s $f(w)$, it holds that

$$E_1(f\,\delta_1 u) = E_1(\nabla f, u)_H.$$

The relation between $\delta u$ and $\delta_1 u$ is given by the following lemma.

Lemma 3.1 Assume that $\ell(w) \in D_{2,1}$, $u \in \mathrm{dom}_2\,\delta$, $\ell \cdot \delta u \in L_2$ and
$\ell \cdot \nabla u \in D_{2,0}(H)$ and $\mu_1 \ll \mu$ (where $D_{2,0}(H)$ is the completion of (3.3) under
the $H$-norm). Then $u \in \mathrm{dom}_2\,\delta_1$ and

$$\delta_1 u = 1_Q(w)\big(\delta u - (\nabla \log \ell(w), u(w))_H\big). \tag{3.9}$$

Proof: Since $f(w)$ is a smooth r.v., $\ell \cdot f \cdot \delta u - f(\nabla \ell, u)_H$ is in $L_1$ and
$\ell(w)\nabla \log \ell(w) = \nabla \ell(w)$ a.s.-$\mu$. Hence

$$E_1\big(f(w)\,\delta u - f(w)(\nabla \log \ell, u)_H\big) = E_0\big(\ell \cdot f \cdot \delta u - \ell f (\nabla \log \ell, u)_H\big)$$

$$= E_0\big\{(\nabla(\ell \cdot f), u)_H - f(\nabla \ell, u)_H\big\} = E_0\,\ell\,(\nabla f, u)_H = E_1(\nabla f, u)_H. \qquad\Box$$

4 Relations between the estimation error and the 
likelihood ratio 

Let $(W, H, \mu)$, $(H, \sigma(H), \mu_X)$, $(\Theta, \mathcal{F}, P)$ and $y(\theta) = \rho x + w$ be as in
section 2. We will further assume that the $H$-valued r.v. $x$ is actually $W^*$ valued, and
$\exp \alpha(x,h)_H \in L^1(\mu_X)$ for all real $\alpha$ and all $h \in W^*$. The measures induced by
$y$ and $x$ on $W$ will be denoted $\mu_Y$ and $\mu_X$ respectively. The conditional probability
induced on $W$ by $y(\theta)$ conditioned on $x$ will be denoted by $\mu_{Y|X}$. Similarly, $\mu_{X|Y}$
will denote the conditional probability induced on $W^*$ of $x$ conditioned on $y$ (cf. e.g. [4]
for the existence of these conditional probabilities).

By the Cameron-Martin theorem (cf. e.g. ^H]) and since $x$ and $w$ are independent,
we have

$$\frac{d\mu_{Y|X}}{d\mu_W}(w) = \exp\left(\rho(w,x) - \frac{\rho^2}{2}|x|_H^2\right), \quad w \in W \tag{4.1}$$

which by our assumptions belongs to $L_p$ for all $p > 0$. Hence, denoting by $\mu_X(dx)$
the restriction of $P$ to $H$:

$$\ell(w) = \frac{d\mu_Y}{d\mu_W}(w) = \int_H \frac{d\mu_{Y|X}}{d\mu_W}(w,x)\,\mu_X(dx)
= \int_H \exp\left(\rho(w,x) - \frac{\rho^2}{2}|x|_H^2\right)\mu_X(dx). \tag{4.2}$$

Proposition 4.1 Under these assumptions it holds that
(a)

$$(\nabla \ell, h)_H =: \nabla_h \ell(w) = \rho\,\ell(w)(\hat{x}, h)_H, \quad \forall h \in H,$$

hence:

$$\nabla \ell = \rho\,\ell(w)\,\hat{x} \quad\text{or}\quad \hat{x} = \frac{1}{\rho}\nabla \log \ell(w) \tag{4.3}$$

a.s. $\mu_W$. Note that a hat denotes the conditional expectation conditioned on $Y$,
equation (2.7).

(b)

$$\big(\nabla^n \ell(w), h_1 \otimes \cdots \otimes h_n\big)_{H^{\otimes n}} = \ell(w)\,\rho^n\,\widehat{\prod_{i=1}^{n}(h_i, x)} \tag{4.4}$$

(c) in particular $\mathrm{trace}\,\nabla^2 \ell(w)$ exists and a.s. $\mu_W$

$$\nabla^2_{h_1,h_2}\ell(w) = \rho^2\,\ell(w)\,\widehat{(h_1,x)\cdot(h_2,x)} \tag{4.5}$$

and

$$\nabla^2_{h,h}\log \ell(w) = \rho^2\Big(\widehat{(x,h)^2} - (\hat{x},h)^2\Big) \tag{4.6}$$

where V^^^^^y? =: (V(V(/?, /12), /^i)^, c/. a/so (gl)) 
(d) 



rt— 1 71—1 



1=1 1 j=i 



Remark 1: Let $E_1$ denote expectation w.r. to the measure induced by $y$ on $W$ and let $E$
denote expectation w.r. to the measure in (2.5). For an operator $A$ on $H$ and $e_i$,
$i = 1, 2, \ldots$ a CONB on $H$ define

$$\mathrm{trace}\,A = \sum_{i}(e_i, Ae_i)_H$$

provided the series converges. Consequently, we have from (4.6) and (4.3) that

$$E_1\,\mathrm{trace}\,\nabla^2 \log \ell = \rho^2 E|x - \hat{x}|_H^2 = \rho^2\big(E|x|_H^2 - E|\hat{x}|_H^2\big)
= \rho^2 E|x|_H^2 - E_1|\nabla \log \ell(w)|_H^2 \tag{4.8}$$

(cf. also equations (6.3) and (6.4)).



Remark 2: (a) Consider the case where the abstract Wiener space is the classical
$\mathbb{R}^n$-valued Wiener space on $[0,T]$; then $u \in H$ is of the form $\int_0^{\cdot} u'(s)\,ds$ and
$x \in H$ is of the form $\int_0^{\cdot} x'(s)\,ds$, where $x'(s) \in \mathbb{R}^n$ and
$\int_0^T |x'(s)|_{\mathbb{R}^n}^2\,ds < \infty$. Further assume that $E\int_0^T |x'(s)|_{\mathbb{R}^n}^2\,ds < \infty$
and $x'(s)$ is a.s. continuous on $[0,T]$. Then given some $t \in [0,T]$, one can consider a
sequence of linear functionals $h_n$ such that $(h_n, x)$ converges in $L_2$ to $x'(t,w)$ and
extend the results of proposition 4.1 to $\hat{x}'_t = E(x'(t) \mid Y)$ for any $t \in [0,T]$.
(b) Consider equation (4.2). Given $w \in W$ we can replace the integration with respect
to $\mu_X(dx)$ with a Monte-Carlo approximation. Similarly we can evaluate
$(\nabla^n \ell(w), h_1 \otimes \cdots \otimes h_n)$ by applying $\nabla^n$ to the integrand of equation (4.2)
and then replacing the integral again with a Monte-Carlo approximation. This can then
be applied to derive a numerical approximation to $\widehat{\prod_{i=1}^{n}(h_i,x)}$, i.e. to the
non-adapted non-linear filtering of $\prod_{i=1}^{n}(h_i,x)$. We will not follow these directions.
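The Monte-Carlo idea of Remark 2(b) can be sketched in a finite dimensional setting. The code below assumes $W = H = \mathbb{R}^d$ with the Euclidean inner product, so that the integrand of (4.2) is $\exp(\rho(w,x) - \rho^2|x|^2/2)$; draws from $\mu_X$ are supplied by the caller. The function name and signature are hypothetical and not taken from the paper. The estimate of $\hat{x}$ uses (4.3), i.e. $\hat{x} = \nabla\ell/(\rho\,\ell)$, with the gradient applied to the integrand.

```python
import math

def mc_likelihood_and_estimate(w, samples, rho):
    """Monte-Carlo versions of (4.2) and (4.3) on W = H = R^d.

    w       : observed point, list of d floats
    samples : draws x_1..x_N from the signal law mu_X (lists of d floats)
    rho     : the scalar signal-to-noise parameter
    Returns (ell, xhat): the estimated likelihood ratio ell(w) and the
    estimated conditional mean xhat = grad ell / (rho * ell).
    """
    d = len(w)
    weights = []
    for x in samples:
        inner = sum(wi * xi for wi, xi in zip(w, x))      # (w, x)
        norm2 = sum(xi * xi for xi in x)                  # |x|^2
        weights.append(math.exp(rho * inner - 0.5 * rho * rho * norm2))
    total = sum(weights)
    ell = total / len(samples)                            # sample mean of integrand
    xhat = [sum(wt * x[j] for wt, x in zip(weights, samples)) / total
            for j in range(d)]                            # weighted mean of x
    return ell, xhat

if __name__ == "__main__":
    # Two-point signal x = +1 or -1 with equal probability, d = 1.
    print(mc_likelihood_and_estimate([0.7], [[1.0], [-1.0]], 2.0))
```

For the two-point signal the sample set $\{+1,-1\}$ reproduces the exact answer $\hat{x} = \tanh(\rho w)$, which gives a simple sanity check; for a general $\mu_X$ the accuracy depends on the number of draws and degrades when the weights are very uneven.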

The following lemma will be needed in the proof of Proposition 4.1.

Lemma 4.1 Assume that $\mu_Y$ and $\mu_{Y|X}$ are absolutely continuous with respect to $\mu_W$;
then for all bounded and measurable functions $\psi$ on $\Theta$

$$\int_{X \times W} \psi(x,y)\,\frac{d\mu_{Y|X}}{d\mu_W}(y,x)\,\mu_X(dx) \times \mu_W(dy)
= \int_{X \times W} \psi(x,y)\,\frac{d\mu_Y}{d\mu_W}(y)\,\mu_{X|Y}(dx,y)\,\mu_W(dy).$$

Proof of Lemma: Let

$$L = \int_{X \times W} \psi(x,y)\,\mu_{X,Y}(dx,dy).$$

Then, by Fubini's theorem

$$L = \int_{X \times W} \psi(x,w)\,\mu_{Y|X}(dw,x)\,\mu_X(dx)
= \int_{X \times W} \psi(x,y)\,\frac{d\mu_{Y|X}}{d\mu_W}(y,x)\,\mu_X(dx)\,\mu_W(dy).$$

Since the conditional probability $\mu_{X|Y}$ is regular (cf. e.g. theorem 10.2.2 of [4]) we
also have

$$L = \int_{X \times W} \psi(x,y)\,\mu_{X|Y}(dx,y)\,\mu_Y(dy)
= \int_{X \times W} \psi(x,y)\,\frac{d\mu_Y}{d\mu_W}(y)\,\mu_{X|Y}(dx,y)\,\mu_W(dy). \qquad\Box$$

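Proof of Proposition: From (4.1) and (4.2), and since by our assumptions we may
(by dominated convergence) interchange the order of integration and differentiation,

$$\nabla_h \ell(w) = \int_H \nabla_h\Big(\rho(w,x) - \frac{\rho^2}{2}|x|_H^2\Big)\exp\Big(\rho(w,x) - \frac{\rho^2}{2}|x|_H^2\Big)\mu_X(dx)$$

$$= \int_H \rho(h,x)\exp\Big(\rho(w,x) - \frac{\rho^2}{2}|x|_H^2\Big)\mu_X(dx)
= \int_H \rho(h,x)\,\frac{d\mu_{Y|X}}{d\mu_W}(w,x)\,\mu_X(dx).$$

Thus, by Lemma 4.1,

$$\nabla_h \ell(w) = \int_X \rho(h,x)\,\frac{d\mu_Y}{d\mu_W}(w)\,\mu_{X|Y}(dx,w) = \rho\,\ell(w)\,\widehat{(h,x)}$$

proving (4.3). The same arguments also hold for repeated differentiation:

$$\big(\nabla^n \ell(w), h_1 \otimes \cdots \otimes h_n\big)_{H^{\otimes n}}
= \rho^n \int_X (h_1,x)\cdots(h_n,x)\,\frac{d\mu_{Y|X}}{d\mu_W}(w,x)\,\mu_X(dx)$$

which yields (4.4). (4.5) follows directly from (4.4) since

$$\nabla^2_{h_1,h_2}\ell(w) = \rho^2\,\ell(w)\,\widehat{(h_1,x)\cdot(h_2,x)}, \tag{4.9}$$

therefore

$$\nabla^2_{h,h}\log \ell(w) = \frac{\nabla^2_{h,h}\ell(w)}{\ell(w)} - \frac{(\nabla_h \ell(w))^2}{(\ell(w))^2}
= \frac{1}{\ell(w)}\nabla^2_{h,h}\ell(w) - |\nabla_h \log \ell(w)|^2 \tag{4.9a}$$

proving (4.5) and (4.6). From (4.4) we have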



n-l 



and (4.7) follows. □

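We conclude this section with some results for $\delta\hat{x}$ and $\delta_1\hat{x}$, the divergences of
$\hat{x}$ with respect to $\mu_W$ and $\mu_Y$ respectively (cf. part (c) of section 3). By the
assumptions of this section $\hat{x} \in \mathrm{dom}_2\,\delta$ and $\hat{x} \in \mathrm{dom}_2\,\delta_1$. Therefore
by (4.3)

$$\delta\hat{x} = \frac{1}{\rho}\,\delta\nabla \log \ell(w).$$

Note that $\mathcal{L} = \delta\nabla$ is the number operator, i.e. if $a(w)$ is a square integrable r.v. on
the Wiener space and $a = \sum_{n=0}^{\infty} I_n$, where the $I_n$ form the Wiener chaos decomposition
of $a$, then, formally, $\mathcal{L}a = \sum_{n=1}^{\infty} n I_n$. Therefore if $\mathcal{L}a(w) \in L_2$ and
$E(\mathcal{L}a(w)) = 0$ then $\mathcal{L}^{-1}\mathcal{L}a$ is well defined; consequently it holds by equation (4.3) that

$$\ell(w) = c \cdot \exp\big(\rho\,\mathcal{L}^{-1}\delta\hat{x}\big) \tag{4.10}$$

where $c$ is a normalizing constant. For $\delta_1\hat{x}$ we have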

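Lemma 4.2

$$\delta_1\hat{x} = \frac{\delta\nabla\ell(w)}{\rho\,\ell(w)}.$$

Proof: By (3.7) and (4.3)

$$\delta\nabla\ell = \rho\,\delta(\ell(w)\hat{x}) = \rho\,\ell(w)\,\delta\hat{x} - \rho(\hat{x}, \nabla\ell)_H
= \rho\,\ell(w)\,\delta\hat{x} - \rho^2\ell(w)(\hat{x},\hat{x})_H$$

and

$$\delta\hat{x} = \frac{1}{\rho\,\ell(w)}\,\delta\nabla\ell(w) + \rho(\hat{x},\hat{x})_H.$$

Hence by Lemma 3.1

$$\delta_1\hat{x} = \delta\hat{x} - (\nabla \log \ell(w), \hat{x})_H = \delta\hat{x} - \rho|\hat{x}|_H^2
= \frac{\delta\nabla\ell(w)}{\rho\,\ell(w)}. \qquad\Box$$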



5 The GSV relation between the mutual information and the mean square of the
estimation error

Consider the setup and assumptions in the first paragraph of section 4. The
mutual information between $x$ and $y$ is defined as

$$I(X;Y) = \int_{X \times W} \log\left(\frac{d\mu_{X,Y}}{d(\mu_X \times \mu_Y)}(x,y)\right)\mu_{X,Y}(dx,dy).$$

$E$ will denote expectation w.r. to the measure in (2.5) (cf. e.g. ^|). $E_0$ will denote
expectation w.r. to the Wiener measure and $E_1$ will denote expectation w.r. to the
measure on $W$ induced by $y$ (hence $Ef(y) = E_1 f(w) = E_0\,\ell(w) f(w)$).



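Proposition 5.1 Under the assumptions of the previous section, it holds that

$$\frac{dI(X;Y)}{d\rho} = \rho E|x - \hat{x}|_H^2 = \rho\big(E|x|_H^2 - E|\hat{x}|_H^2\big). \tag{5.1}$$

Proof: By our assumptions, and since
$\frac{d\mu_{X,Y}}{d(\mu_X \times \mu_Y)}(x,y) = \frac{d\mu_{Y|X}}{d\mu_W}(y,x) \Big/ \frac{d\mu_Y}{d\mu_W}(y)$, we have

$$I(X;Y) = \int_{X \times W}\left[\log\frac{d\mu_{Y|X}}{d\mu_W}(y,x) - \log\frac{d\mu_Y}{d\mu_W}(y)\right]\mu_{X,Y}(dx,dy)
= E\Big(\rho(y,x) - \frac{\rho^2}{2}|x|_H^2\Big) - E\log\ell(y).$$

Note that $E\rho(y,x) = \rho^2 E|x|_H^2$, hence

$$I(X;Y) = \frac{\rho^2}{2}E|x|_H^2 - E_1\log\ell(w) \tag{5.2}$$

and

$$\frac{dI(X;Y)}{d\rho} = \rho E|x|_H^2 - \frac{d}{d\rho}E_0\big(\ell(w)\log\ell(w)\big) \tag{5.3}$$

$$= \rho E|x|_H^2 - E_0\Big(\frac{d\ell(w)}{d\rho}\log\ell(w)\Big) - 0, \tag{5.4}$$

where the last term vanishes since $E_0\,\frac{d\ell(w)}{d\rho} = \frac{d}{d\rho}E_0\,\ell(w) = 0$. Now,

$$\frac{d\ell(w)}{d\rho} = \int_X \big((x,w) - \rho|x|_H^2\big)\frac{d\mu_{Y|X}}{d\mu_W}(w,x)\,\mu_X(dx).$$

By lemma 4.1

$$\frac{d\ell(w)}{d\rho} = \int_X \big((x,w) - \rho|x|_H^2\big)\,\ell(w)\,\mu_{X|Y}(dx,w)
= \Big((\hat{x},w) - \rho\,\widehat{|x|_H^2}\Big)\ell(w).$$

Substituting in (5.4) yields

$$\frac{dI}{d\rho} = \rho E|x|_H^2 - E_0\Big((\hat{x},w) - \rho\,\widehat{|x|_H^2}\Big)\ell(w)\log\ell(w). \tag{5.5}$$

Now, using in turn (4.3), the relation between the divergence and the trace of the
gradient from section 3(b), the product rule (3.7), and finally the zero mean of the
divergence (3.6) together with (4.5) and (4.3),

$$E_0\,\ell\log\ell\,(\hat{x},w) = E_0\,\frac{1}{\rho}\log\ell\,(\nabla\ell, w)$$

$$= E_0\,\frac{1}{\rho}\log\ell\,\big(\delta\nabla\ell + \mathrm{trace}\,\nabla^2\ell\big)$$

$$= E_0\,\frac{1}{\rho}\,\delta(\log\ell\,\nabla\ell) + E_0\,\frac{1}{\rho}(\nabla\ell, \nabla\log\ell)_H + E_0\,\frac{1}{\rho}\big(\log\ell\;\mathrm{trace}\,\nabla^2\ell\big)$$

$$= 0 + \frac{1}{\rho}E_0\,\frac{1}{\ell}(\nabla\ell,\nabla\ell)_H + E_0\,\rho\,\ell\log\ell\,\widehat{|x|_H^2}$$

$$= \rho E_0\,\ell(w)(\hat{x},\hat{x})_H + E_0\,\rho\,\ell(w)\log\ell(w)\,\widehat{|x|_H^2}.$$

Substituting into (5.5) yields

$$\frac{dI}{d\rho} = \rho E|x|_H^2 - \rho E|\hat{x}|_H^2 + E_0\,\rho\,\ell\log\ell\,\widehat{|x|_H^2} - E_0\,\rho\,\ell\log\ell\,\widehat{|x|_H^2}
= \rho E|x - \hat{x}|_H^2. \qquad\Box$$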
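As a sanity check of the relation $dI(X;Y)/d\rho = \rho E|x - \hat{x}|_H^2$, the following sketch assumes the simplest case, which is not treated separately in the paper: a scalar Gaussian signal $x \sim N(0,\sigma^2)$ and $y = \rho x + w$ with $w \sim N(0,1)$. In that case the closed forms $I = \frac{1}{2}\log(1 + \rho^2\sigma^2)$ and $E|x - \hat{x}|^2 = \sigma^2/(1 + \rho^2\sigma^2)$ are standard. The function names are hypothetical.

```python
import math

def mutual_information(rho, sigma2):
    """I(X;Y) for y = rho*x + w, x ~ N(0, sigma2), w ~ N(0, 1)."""
    return 0.5 * math.log(1.0 + rho * rho * sigma2)

def mmse(rho, sigma2):
    """E|x - E(x|y)|^2 for the same scalar Gaussian channel."""
    return sigma2 / (1.0 + rho * rho * sigma2)

def dI_drho(rho, sigma2, h=1e-5):
    """Central-difference approximation of dI/drho."""
    return (mutual_information(rho + h, sigma2)
            - mutual_information(rho - h, sigma2)) / (2.0 * h)

if __name__ == "__main__":
    for rho in (0.3, 1.0, 2.5):
        # The two printed columns agree up to the finite-difference error.
        print(rho, dI_drho(rho, 1.7), rho * mmse(rho, 1.7))
```

With $\mathrm{snr} = \rho^2$ this is the scalar GSV relation $dI/d\,\mathrm{snr} = \frac{1}{2}\,\mathrm{mmse}$ written in the $\rho$ parametrization.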



Remark (a): $E_1\log\ell(w)$ $(= E\log\ell(y))$ is the relative entropy (or I-divergence or
Kullback-Leibler number) of $\mu_Y$ with respect to $\mu_W$ (cf. e.g. [12] or [2]). Equation
(5.2) relates this relative entropy to the mutual information $I(x;y)$ for the additive
Gaussian channel. By equations (5.1) and (5.2),

$$\frac{d}{d\rho}E_1\log\ell(w) = \rho E_1|\hat{x}|_H^2. \tag{5.6}$$

Remark (b): Consider the following generalizations of the additive Gaussian channel.
Let $M$ be "the space of messages which generate the signals" $x$, i.e. $(M, \mathcal{B}_M, P_M)$
is a probability space and $x = g(m)$, $m \in M$, where $g$ is a measurable map from
$(M, \mathcal{B}_M)$ to $H$. Then obviously $I(M;Y) = I(X;Y)$. More generally, consider the case
where $x$ and $m$ are related by some joint probability on $M \times H$ and $w$ and $m$ are
conditionally independent conditioned on $x$. The extension of proposition 5.1 in this
context follows along the same arguments as in theorem 13 of [7] and is therefore omitted.

6 An extended version of the De Bruijn identity



The Fisher information matrix $J$ associated with a smooth probability density
$p(y_1, \ldots, y_n)$, $y \in \mathbb{R}^n$ is defined as



J 



(9^ logp(yi, ■ ■ ■ 



and then the Fisher information which is defined by the r.h.s. of ()6.ip satisfies: 

2 



trace J = —E 



Vlogp 



(6.1) 



where $E$ is the expectation with respect to the $p$ density. The De Bruijn identity
(cf. [3^ or [T] and the references therein) deals with the case where $y = x + \sqrt{t}\,w$,
where $w = (w_1, w_2, \ldots, w_n)$ and the $w_j$, $j = 1, \ldots, n$ are i.i.d. $N(0,1)$ and $x$ is an
$\mathbb{R}^n$ valued random variable independent of $w$. It states that

$$\frac{d}{dt}E\log p(y) = -\frac{1}{2}E|\nabla\log p(y)|^2. \tag{6.2}$$

The Fisher information matrix cannot be extended directly to the case where $y$ is
infinite dimensional. However, the results of sections 4 and 5 yield some similar
relations. Under the assumptions of section 5, comparing (5.1) with (5.3) we have

$$\frac{d}{d\rho}E_1\log\ell(w) = \rho E_1|\hat{x}|_H^2 = \frac{1}{\rho}E_1|\nabla\log\ell(w)|_H^2, \tag{6.3}$$

which is "similar" to (6.2) and may be considered an extended De Bruijn identity.
Note that $E_1\log\ell(w)$ is the relative entropy of $\mu_Y$ relative to $\mu_W$; also, note the
difference between the $\rho$ and the $t$ parametrizations.

Comparing (5.1) with (4.8) yields

$$E_1\,\mathrm{trace}\,\nabla^2\log\ell(w) = \rho\,\frac{dI(X;Y)}{d\rho} = \rho^2 E|x|_H^2 - E_1|\nabla\log\ell(w)|_H^2, \tag{6.4}$$

which is different from (6.1) by the $\rho^2 E|x|_H^2$ term. Note that the validity of (6.4) is
restricted to the case where $\ell(w)$ is induced by a signal plus independent noise model
and does not extend to any $\ell(w)$ which is a nonnegative r.v. and whose expectation is 1,
cf. the concluding lines of the next section.



7 Adding a "time parameter" to the abstract Wiener 
space 

Given an abstract Wiener space $(W, H, \mu)$ we can introduce the notion of continuous
time on this space as follows. Let $\{\pi_\theta, 0 \le \theta \le 1\}$ be a continuous, strictly
increasing resolution of the identity on $H$ with $\pi_0 = 0$, $\pi_1 = I$. Set
$\mathcal{F}_\theta = \sigma\{\delta\pi_\theta h, h \in H\}$, and $\mathcal{F}_\cdot$ will denote the filtration induced by
$\mathcal{F}_\theta$ on $[0,1]$. An $H$-valued r.v. $u(w)$ will be said to be adapted to $\mathcal{F}_\cdot$ if
$(u, \pi_\theta h)_H$ is $\mathcal{F}_\theta$ measurable for all $h \in H$ and every $\theta \in [0,1]$
(cf. section 2.6 of ^H] for more details). Let $D_2(H)$ denote the class of $H$-valued $u(w)$
such that $E_1|u|_H^2 < \infty$. The class of adapted square integrable random variables is
a closed subspace of $D_2(H)$ and will be denoted by $D_2^a(H)$. We will denote by $\tilde{u}$ the
projection of $u \in D_2(H)$ on $D_2^a(H)$, i.e.

$$E_1|u - \tilde{u}|_H^2 = \inf_{v \in D_2^a(H)} E_1|u - v|_H^2 \tag{7.1}$$

(this corresponds to the dual predictable projection in martingale theory). Since $x$ is
independent of $w$, and since $\hat{x} = E_1(x \mid \sigma(y))$, then

$$\tilde{x} = (\hat{x})^{\sim} \tag{7.2}$$

i.e. if $x$ is not measurable on the $\sigma$-field induced by $y$, project, first, $x$ on the
$\sigma$-field generated by $y$ and then project on $D_2^a(H)$, which is the same as replacing $u$
with $x$ in (7.1). Then (cf. e.g. H)

$$\ell(y) = \exp\Big(\rho\,\delta\tilde{x} - \frac{\rho^2}{2}|\tilde{x}|_H^2\Big). \tag{7.3}$$

By the same arguments as in [Hj or jHj and by the assumptions of proposition 5.1

$$I(X;Y) = \frac{\rho^2}{2}E|x - \tilde{x}|_H^2. \tag{7.4}$$

Remarks: (a) The left hand side of (7.4) is independent of the choice of $\pi_\cdot$ while $\tilde{x}$
does depend on the choice of $\pi_\cdot$. Consequently, by (7.4), $E|x - \tilde{x}|_H^2$ is independent of
the particular choice of $\pi_\cdot$. (b) The validity of (7.3) and (7.4) is not restricted to the
case where $x$ is independent of $w$ (cf. [9] and [18]).

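By equations (5.2) and (7.4), with $\tilde{x}$ the causal projection of $x$,

$$E_1\log\ell(w) = \frac{\rho^2}{2}E|x|_H^2 - \frac{\rho^2}{2}E|x - \tilde{x}|_H^2
= \frac{\rho^2}{2}E\big(-|\tilde{x}|_H^2 + 2(x,\tilde{x})_H\big).$$

Hence, since $E(x,\tilde{x})_H = E|\tilde{x}|_H^2$,

$$E_1\log\ell(w) = \frac{\rho^2}{2}E|\tilde{x}|_H^2. \tag{7.5}$$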



We conclude the paper with the following remark. Obviously, for the causal projection
$\tilde{x}$ and the non-causal estimate $\hat{x}$,

$$E|\tilde{x}|_H^2 \le E|\hat{x}|_H^2. \tag{7.6}$$

Hence by (7.5) and (4.3)

$$E_1\log\ell(w) \le \frac{1}{2}E_1|\nabla\log\ell(w)|_H^2$$

or

$$E_0\,\ell(w)\log\ell(w) \le \frac{1}{2}E_0\,\ell(w)|\nabla\log\ell(w)|_H^2. \tag{7.7}$$

Setting $f(w) = c\,\ell(w)$, $c > 0$, then

$$E_0\,f(w)\log f(w) \le E_0 f \cdot \log E_0 f + \frac{1}{2}E_0\,f|\nabla\log f|_H^2 \tag{7.8}$$

which is the logarithmic Sobolev inequality of L. Gross on Wiener space (cf. e.g.
section 9.2 of [20] and the references therein). Note, however, that (7.8) is not the
complete logarithmic Sobolev inequality since, as derived above, it holds only for
the case where $\ell(w)$ is the likelihood ratio associated with $x + w$ where $x$ and $w$ are
independent (and not for any nonnegative $\ell(w)$ for which $E\ell(w) = 1$ cf. pij).
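The restricted inequality $E_1\log\ell(w) \le \frac{1}{2}E_1|\nabla\log\ell(w)|_H^2$ can be checked in closed form in a simple case. The sketch below assumes a scalar Gaussian signal $x \sim N(0,\sigma^2)$ with $y = \rho x + w$, $w \sim N(0,1)$, and writes $s = \rho^2\sigma^2$. Then $\ell(w) = (1+s)^{-1/2}\exp\big(s w^2 / (2(1+s))\big)$, so that $E_1\log\ell = \frac{s}{2} - \frac{1}{2}\log(1+s)$ and $\frac{1}{2}E_1|\nabla\log\ell|^2 = \frac{s^2}{2(1+s)}$. These closed forms are worked out for this illustration and are not taken from the paper; the function names are hypothetical.

```python
import math

def relative_entropy(rho, sigma2):
    """E_1 log ell(w) for the scalar Gaussian channel."""
    s = rho * rho * sigma2
    return 0.5 * s - 0.5 * math.log(1.0 + s)

def half_energy_of_gradient(rho, sigma2):
    """(1/2) E_1 |grad log ell|^2 = (rho^2 / 2) E|xhat|^2."""
    s = rho * rho * sigma2
    return 0.5 * s * s / (1.0 + s)

if __name__ == "__main__":
    for rho in (0.1, 1.0, 3.0):
        left = relative_entropy(rho, 2.0)
        right = half_energy_of_gradient(rho, 2.0)
        print(rho, left, right, left <= right)
```

In this case the inequality reduces to $s/(1+s) \le \log(1+s)$, which holds for all $s \ge 0$ with equality only at $s = 0$.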

Inequality (7.8) follows from the obvious inequality (7.6) and the equalities derived
earlier in this paper. The question arises whether a similar argument can yield (7.8)
without the restriction on $\ell(w)$ to be generated by a signal plus independent white
noise. This seems to be a delicate problem; the left hand side of (7.7) can be shown
to be equal to the left hand side of (7.6) without the restriction that the signal and
noise are independent (P). However it is not clear if the right hand sides of (7.7) and
(7.6) are equal.